Optimal Stimulus and Noise Distributions for Information 
Transmission via Suprathreshold Stochastic Resonance 

Mark D. McDonnell,¹ Nigel G. Stocks,² and Derek Abbott¹

¹ School of Electrical and Electronic Engineering & Centre for Biomedical Engineering,
The University of Adelaide, SA 5005, Australia
² School of Engineering, The University of Warwick,
Coventry CV4 7AL, United Kingdom
(Dated: February 1, 2008) 

Abstract 

Suprathreshold stochastic resonance (SSR) is a form of noise-enhanced signal transmission that
occurs in a parallel array of independently noisy identical threshold nonlinearities, including model 
neurons. Unlike most forms of stochastic resonance, the output response to suprathreshold random 
input signals of arbitrary magnitude is improved by the presence of even small amounts of noise. 
In this paper the information transmission performance of SSR in the limit of a large array size is 
considered. Using a relationship between Shannon's mutual information and Fisher information, 
a sufficient condition for optimality, i.e. channel capacity, is derived. It is shown that capacity is 
achieved when the signal distribution is Jeffrey's prior, as formed from the noise distribution, or 
when the noise distribution depends on the signal distribution via a cosine relationship. These 
results provide theoretical verification and justification for previous work in both computational 
neuroscience and electronics. 
I. INTRODUCTION



The term 'stochastic resonance' describes the situation where a system's response to some 
signal is optimized by the presence of random noise, rather than its absence. It occurs in a 
wide variety of nonlinear physical [1] and biological [2] systems.

In many of the systems and models in which stochastic resonance (SR) has been observed, 
the essential nonlinearity is a single static threshold, e.g. [3, 4, 5, 6]. It is generally thought that SR cannot occur in such systems for suprathreshold signals, meaning that the amplitude of the input signal needs to be restricted to values smaller than the amplitude of the threshold for SR to occur [7].

However, the 1999 discovery of a novel form of SR, known as suprathreshold stochastic resonance (SSR), showed that this is not always true. SSR occurs in an array of identical threshold nonlinearities, each of which is subject to independent random additive noise. We refer to this array as the SSR model; see Fig. 1. In this model SR occurs regardless of whether the input signal is entirely subthreshold or not. Furthermore, SSR occurs even for very large input SNRs. This is a further difference from conventional SR, for which the signal is required to be weak compared to the noise.

SSR is a form of aperiodic stochastic resonance [4, 9, 10] that was first shown to occur by calculating Shannon's average mutual information for the SSR model [8]. It was subsequently found that the performance achievable via SSR is maximized when all threshold values are set to the signal mean [11], and that for sufficiently small input SNRs, modifying the thresholds in the model cannot improve information transfer [12].

The SSR model was originally motivated as a model for parallel sensory neurons, such 
as those synapsing with hair cells in the inner ear [3]. Although the basic SSR model is 
non-dynamical, and does not model the many complexities of real neurons, each threshold 
nonlinearity is equivalent to a Pitts-McCulloch neuron model, and encapsulates the neural 
coding properties we are interested in — i.e. the generation of action potentials in response 
to a noisy aperiodic random stimulus. The small input SNRs we focus on are biologically relevant [14], particularly so for hair cells, which are subject to substantial Brownian motion [15]. This leads to much randomness in the release of neurotransmitters at synapses with afferent neurons leading to the cochlear nucleus.



Further justification of the SSR model's relevance to neural coding is discussed in [16, 17, 18], and by extensions of the model to include more biologically realistic neural features. For example, the parallel array has been modified to consist of parallel FitzHugh-Nagumo neuron models [19], leaky integrate-and-fire neuron models [16, 17] and Hodgkin-Huxley models [16], and for the case of signal-dependent (multiplicative) noise [18]. In all cases the same qualitative results as for the simple threshold model were obtained. The SSR effect has also led to a proposal for improving the performance of cochlear implants for suprathreshold stimuli [13], based on the idea that the natural randomness present in functioning cochlear hair cells is missing in patients requiring implants [20].

The purpose of this paper is to analyze, in a general manner, the information theoretic upper limits of performance of the SSR model. This requires allowing the array size, $N$, to approach infinity. Previous work has discussed the scaling of the mutual information through the SSR model with $N$ for specific cases, and found conditions for which the maximum mutual information, i.e. channel capacity, occurs [11, 16, 21]. In a neural coding context, the question of 'what is the optimal stimulus distribution?' for a given noise distribution is discussed numerically for the SSR model in [16].

In Sec. II we significantly extend the results in [11, 16, 21], by showing that the mutual information and output entropy can both be written in terms of simple relative entropy expressions; see Eqs. (21) and (22). This leads to a very general sufficient condition, Eq. (25), for achieving capacity in the large $N$ regime that can be achieved either by optimizing the signal distribution for a given noise distribution, or by optimizing the noise for a given signal. Given the neuroscience motivation for studying the SSR model, this result is potentially highly significant in computational neuroscience, where both optimal stimulus distributions and optimal tuning curves are often considered [16, 22].



Furthermore, the optimal signal for the special case of uniform noise is shown to be 
the arcsine distribution (a special case of the Beta distribution), which has a relatively large 
variance and is bimodal. This result provides theoretical justification for a proposed heuristic 
method for analog-to-digital conversion based on the SSR model [23]. In this method, the
input signal is transformed so that it has a large variance and is bimodal. 

As a means of verification of our theory, in Sec. III our general results are compared to the specific capacity results contained in [11, 16, 21]. This leads us to find and justify improvements to these previous results.

Before we proceed however, the remainder of this section outlines our notation, describes 
the SSR model, and derives some important results that we utilize.



A. Information Theoretic Definitions



Recent work using the SSR model has described performance using measures other than mutual information [24, 25, 26, 27, 28, 29]. However, in line with much theoretical neuroscience research [12], here we use the information theoretic viewpoint where the SSR model can be considered to be a communications channel [8].

Throughout, we denote the probability mass function (PMF) of a discrete random variable, $\alpha$, as $P_\alpha(\cdot)$, the probability density function (PDF) of a continuous random variable, $\beta$, as $f_\beta(\cdot)$, and the cumulative distribution function (CDF) of $\beta$ as $F_\beta(\cdot)$.

All signals are discrete-time memoryless sequences of samples drawn from the same sta- 
tionary probability distribution. This differs from the detection scenario often considered in 
SR research, in which the input signal is periodic. Such a signal does not convey new information with an increasing number of samples, and cannot be considered from an information theoretic viewpoint.

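Consider two continuous random variables, $X$ and $Y$, with PDFs $f_X(x)$ and $f_Y(x)$, with the same support, $S$. The relative entropy, or Kullback-Leibler divergence, between the two distributions is defined as [30]

$$D(f_X \| f_Y) = \int_{\eta \in S} f_X(\eta) \log_2\left(\frac{f_X(\eta)}{f_Y(\eta)}\right) d\eta. \qquad (1)$$

Suppose $X$ and $Y$ have joint PDF, $f_{XY}(x,y)$. Shannon's mutual information between $X$ and $Y$ is defined as the relative entropy between the joint PDF and the product of the marginal PDFs,

$$I(X,Y) = D\big(f_{XY}(x,y) \,\|\, f_X(x) f_Y(y)\big) = \int_x \int_y f_{XY}(x,y) \log_2\left(\frac{f_{XY}(x,y)}{f_X(x) f_Y(y)}\right) dx\, dy = H(Y) - H(Y|X) \ \text{bits per sample}, \qquad (2)$$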

where H(Y) is the entropy of Y and H(Y\X) is the average conditional entropy of Y given 
X. 

The definition of mutual information also holds for discrete random variables, and for one variable discrete and one continuous. The entropy of a discrete random variable, $Y$, is given by

$$H(Y) = -\sum_{n=0}^{N} P_Y(n) \log_2 P_Y(n), \qquad (3)$$

while a continuous random variable, $X$, has differential entropy

$$H(X) = -\int_{\eta \in S} f_X(\eta) \log_2\big(f_X(\eta)\big)\, d\eta. \qquad (4)$$

In this paper we are interested in the case of $X$ continuous with support $S$ and $Y$ discrete, with $N+1$ states, in which case the average conditional entropy of $Y$ given $X$ is

$$H(Y|X) = -\int_{x \in S} f_X(x) \sum_{n=0}^{N} P_{Y|X}(n|x) \log_2\big(P_{Y|X}(n|x)\big)\, dx. \qquad (5)$$

In information theory, the term channel capacity is defined as being the maximum achievable mutual information of a given channel [30]. Suppose $X$ is the source random variable, and $Y$ is the random variable at the output of the channel. Usually, the channel is assumed to be fixed and the maximization performed over all possible source PDFs, $f_X(x)$. The channel capacity, $C$, can be expressed as the optimization problem,

$$\text{Find: } C = \max_{f_X(x)} I(X,Y). \qquad (6)$$

Usually there are prescribed constraints on the source distribution, such as a fixed average power or a finite alphabet [30]. In Sec. III we will also consider the more stringent constraint
that the PDF of the source is known other than its variance. In this situation, channel 
capacity is determined by finding the optimal source variance, or as is often carried out in 
SR research, the optimal noise variance. 



B. SSR Model 

Fig. 1 shows a schematic diagram of the SSR model. The array consists of $N$ parallel threshold nonlinearities, or 'devices', each of which receives the same random input signal, $X$, with PDF $f_X(\cdot)$. The $i$-th device in the model is subject to continuously valued i.i.d. (independent and identically distributed) additive random noise, $\eta_i$ ($i = 1, \ldots, N$), with PDF $f_\eta(\cdot)$. Each noise signal is also required to be independent of the signal, $X$. The output of each device, $y_i$, is unity if the input signal, $X$, plus the noise on that device's threshold, $\eta_i$, is greater than the threshold value, $\theta$. The output signal is zero otherwise. The outputs from each device, $y_i$, are summed to give the overall output signal, $y = \sum_{i=1}^{N} y_i$. This output is integer valued, $y \in \{0, \ldots, N\}$, and is therefore a quantization (digitization) of $X$ [28].
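The model just described is straightforward to simulate directly. The following is a minimal Monte Carlo sketch, not the authors' code: it assumes zero-mean Gaussian signal and noise and a threshold of $\theta = 0$ (the analysis in this paper does not require these choices), and the helper names `ssr_output` and `simulate` are hypothetical.

```python
import random


def ssr_output(x, n_devices, theta=0.0, noise_sigma=1.0, rng=random):
    """Overall SSR output y = sum_i y_i for a single input sample x.

    Device i outputs y_i = 1 when x + eta_i > theta and 0 otherwise; the
    eta_i are drawn i.i.d. from a zero-mean Gaussian (an assumption here).
    """
    return sum(1 for _ in range(n_devices)
               if x + rng.gauss(0.0, noise_sigma) > theta)


def simulate(num_samples, n_devices, signal_sigma=1.0, noise_sigma=1.0, seed=0):
    """Return (x, y) pairs for i.i.d. zero-mean Gaussian input samples."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(num_samples):
        x = rng.gauss(0.0, signal_sigma)
        pairs.append((x, ssr_output(x, n_devices, 0.0, noise_sigma, rng)))
    return pairs


if __name__ == "__main__":
    for x, y in simulate(num_samples=5, n_devices=15):
        print(f"x = {x:+.3f} -> y = {y}")  # y is an integer in {0, ..., 15}
```

Each output is a noisy, integer-valued quantization of the corresponding input sample, which is the sense in which the array acts as a stochastic quantizer.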

The conditional PMF of the output given the input is $P_{y|X}(y = n | X = x)$, $n \in \{0, \ldots, N\}$. We abbreviate this to $P_{y|X}(n|x)$. The output distribution is

$$P_y(n) = \int_x P_{y|X}(n|x) f_X(x)\, dx, \quad n \in \{0, \ldots, N\}. \qquad (7)$$

The mutual information between $X$ and $y$ is that of a semi-continuous channel [8], and can be written as

$$I(X,y) = H(y) - H(y|X) = -\sum_{n=0}^{N} P_y(n) \log_2 P_y(n) + \int_{-\infty}^{\infty} f_X(x) \sum_{n=0}^{N} P_{y|X}(n|x) \log_2 P_{y|X}(n|x)\, dx. \qquad (8)$$

To progress further we use the notation introduced in [8]. Let $P_{1|x}$ be the probability of the $i$-th threshold device giving output $y_i = 1$ in response to input signal value, $X = x$. If the noise CDF is $F_\eta(\cdot)$, then

$$P_{1|x} = 1 - F_\eta(\theta - x). \qquad (9)$$

As noted in [8], $P_{y|X}(n|x)$ is given by the binomial distribution as

$$P_{y|X}(n|x) = \binom{N}{n} P_{1|x}^{\,n} (1 - P_{1|x})^{N-n}, \quad n \in \{0, \ldots, N\}, \qquad (10)$$

and Eq. (8) reduces to

$$I(X,y) = -\sum_{n=0}^{N} P_y(n) \log_2\left(\frac{P_y(n)}{\binom{N}{n}}\right) + N \int_x f_X(x) P_{1|x} \log_2 P_{1|x}\, dx + N \int_x f_X(x) (1 - P_{1|x}) \log_2(1 - P_{1|x})\, dx. \qquad (11)$$

Numerically evaluating Eq. (11) as a function of input SNR for given signal and noise distributions shows that the mutual information has a unimodal stochastic resonance curve for $N > 1$, even when the signal and noise are both suprathreshold, i.e. the threshold value, $\theta$, is set to the signal mean [11, 24].
Further analytical simplification of Eq. (8) is possible in the case where the signal and noise PDFs are identical with the same variance, i.e. $f_X(x) = f_\eta(\theta - x)\ \forall x$ [11]. The result is

$$I(X,y) = \log_2(N+1) - \frac{N}{2\ln 2} - \frac{1}{N+1} \sum_{n=2}^{N} (N + 1 - 2n) \log_2 n. \qquad (12)$$

What is quite remarkable about this result is that the mutual information is independent of the shape of the PDFs of the signal and noise, other than that $f_X(x) = f_\eta(\theta - x)\ \forall x$. This means that both PDFs have the same shape, but may possibly have different means, and be mutually reversed along the $x$-axis about their means. In Sec. II D we compare the mutual information of Eq. (12) with our calculations of the general channel capacity.
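The shape independence of Eq. (12) can be checked by evaluating Eq. (11) directly. The sketch below is an illustration under stated assumptions rather than the authors' method: it integrates Eq. (11) with a midpoint rule over a truncated $x$ range for two matched cases with $\theta = 0$ (unit-variance Gaussian, and standard logistic signal and noise) and compares the result with the closed form of Eq. (12). The truncation limits, step counts and function names are arbitrary choices made here.

```python
import math
from statistics import NormalDist


def xlog2x(p):
    """p * log2(p), using the convention 0 * log2(0) = 0."""
    return p * math.log2(p) if p > 0.0 else 0.0


def mi_eq11(n, signal_pdf, noise_cdf, theta=0.0, lo=-12.0, hi=12.0, steps=6000):
    """Eq. (11) by midpoint-rule quadrature over x in [lo, hi]."""
    binom = [math.comb(n, m) for m in range(n + 1)]
    dx = (hi - lo) / steps
    p_y = [0.0] * (n + 1)
    cond = 0.0  # the two N * integral terms of Eq. (11)
    for k in range(steps):
        x = lo + (k + 0.5) * dx
        w = signal_pdf(x) * dx
        p1 = 1.0 - noise_cdf(theta - x)  # Eq. (9)
        for m in range(n + 1):           # Eq. (7) using Eq. (10)
            p_y[m] += w * binom[m] * p1 ** m * (1.0 - p1) ** (n - m)
        cond += n * w * (xlog2x(p1) + xlog2x(1.0 - p1))
    return -sum(p * math.log2(p / binom[m]) for m, p in enumerate(p_y)) + cond


def mi_eq12(n):
    """Closed form of Eq. (12), valid when f_X(x) = f_eta(theta - x)."""
    s = sum((n + 1 - 2 * m) * math.log2(m) for m in range(2, n + 1))
    return math.log2(n + 1) - n / (2.0 * math.log(2.0)) - s / (n + 1)


def logistic_pdf(x):
    return math.exp(-abs(x)) / (1.0 + math.exp(-abs(x))) ** 2


def logistic_cdf(x):
    return 1.0 / (1.0 + math.exp(-x))


gauss = NormalDist(0.0, 1.0)
for n in (1, 5, 10):
    print(n, mi_eq12(n),
          mi_eq11(n, gauss.pdf, gauss.cdf),
          mi_eq11(n, logistic_pdf, logistic_cdf, lo=-30.0, hi=30.0, steps=12000))
```

All three columns should agree to within quadrature error, even though the Gaussian and logistic PDFs have different shapes.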



C. Describing SSR Using a Single PDF, $f_Q(\tau)$

We now show that the mutual information in the SSR model depends solely on $N$ and an auxiliary PDF, $f_Q(\cdot)$. This PDF is shown to be that of the random variable describing the conditional average output of the SSR model, given that the input signal is $X = x$.



1. $f_Q(\tau)$ as the PDF of the Average Transfer Function

Although the output of the SSR model, $y$, is a discrete random variable, the conditional expected value of $y$, given the input is $X = x$, is a continuous random variable, since $X$ is. We label this random variable as $\bar{Y}$. Since the PMF of $y$ given $X = x$ is the binomial PMF as in Eq. (10), we know that $\bar{Y}$ is the random variable that results from $\bar{y} = E[y | X = x] = N P_{1|x}$. Inverting this gives $x = \theta - F_\eta^{-1}(1 - \bar{y}/N)$.

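The PDF of $\bar{Y}$ can be derived from $f_X(\cdot)$, since $\bar{y} = N P_{1|x}$ provides an invertible transformation of $X$, with PDF $f_X(x)$, to $\bar{Y}$, with PDF $f_{\bar{Y}}(\bar{y})$. Using the well known expression for the resultant PDF, and provided the support of $f_X(x)$ is contained in the support of $f_\eta(\theta - x)$ (since otherwise $dx/d\bar{y}$ does not necessarily exist), we have

$$f_{\bar{Y}}(\bar{y}) = \left. f_X(x) \left|\frac{dx}{d\bar{y}}\right| \right|_{x = \theta - F_\eta^{-1}(1 - \bar{y}/N)} = \left. \frac{f_X(x)}{N f_\eta(\theta - x)} \right|_{x = \theta - F_\eta^{-1}(1 - \bar{y}/N)}, \quad \bar{y} \in [0, N]. \qquad (13)$$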
Our condition regarding the supports of the signal and noise ensures that $f_\eta(\cdot) \neq 0$. If we make a further change to a new random variable, $Q$, via $\tau = \bar{y}/N$, the PDF of $Q$ is

$$f_Q(\tau) = \left. \frac{f_X(x)}{f_\eta(\theta - x)} \right|_{x = \theta - F_\eta^{-1}(1 - \tau)}, \quad \tau \in [0, 1], \qquad (14)$$

and the PDF of $\bar{Y}$ can be written as

$$f_{\bar{Y}}(\bar{y}) = \frac{f_Q(\bar{y}/N)}{N}, \qquad (15)$$

which illustrates the physical significance of the auxiliary PDF, $f_Q(\cdot)$, as the PDF of $\bar{y}/N$.



2. Mutual Information in Terms of $f_Q(\tau)$

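Making a change of variable in Eq. (11) from $x$ to $\tau$, via $\tau = P_{1|x} = 1 - F_\eta(\theta - x)$, gives

$$I(X,y) = -\sum_{n=0}^{N} P_y(n) \log_2\left(\frac{P_y(n)}{\binom{N}{n}}\right) + N \int_{\tau=0}^{\tau=1} f_Q(\tau)\, \tau \log_2 \tau\, d\tau + N \int_{\tau=0}^{\tau=1} f_Q(\tau) (1 - \tau) \log_2(1 - \tau)\, d\tau, \qquad (16)$$

where

$$P_y(n) = \binom{N}{n} \int_{\tau=0}^{\tau=1} f_Q(\tau)\, \tau^n (1 - \tau)^{N-n}\, d\tau. \qquad (17)$$

Eqs. (16) and (17) show that the PDF $f_Q(\tau)$ encapsulates the behavior of the mutual information in the SSR model.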
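Because Eqs. (16) and (17) involve only $N$ and $f_Q(\tau)$, the mutual information can be computed from $f_Q$ alone. The sketch below is one way of doing so, assuming a midpoint rule in a variable $u$ defined by $\tau = (1 - \cos \pi u)/2$; this substitution is our own choice, made so that arcsine-type singularities of $f_Q$ at $\tau = 0$ and $\tau = 1$ do not spoil the quadrature. The name `mi_from_fq` is hypothetical. A uniform $f_Q$ corresponds, via Eq. (14), to the matched case $f_X(x) = f_\eta(\theta - x)$, so it should reproduce Eq. (12).

```python
import math


def mi_from_fq(n, f_q, steps=4000):
    """Mutual information (bits) from Eqs. (16)-(17) for an auxiliary PDF f_q.

    Quadrature uses the substitution tau = (1 - cos(pi*u)) / 2 with a
    midpoint rule in u, which clusters nodes near tau = 0 and tau = 1.
    """
    binom = [math.comb(n, m) for m in range(n + 1)]
    p_y = [0.0] * (n + 1)
    cond = 0.0  # the two N * integral terms of Eq. (16)
    du = 1.0 / steps
    for k in range(steps):
        u = (k + 0.5) * du
        tau = 0.5 * (1.0 - math.cos(math.pi * u))
        w = f_q(tau) * 0.5 * math.pi * math.sin(math.pi * u) * du  # f_Q(tau) dtau
        for m in range(n + 1):  # Eq. (17)
            p_y[m] += w * binom[m] * tau ** m * (1.0 - tau) ** (n - m)
        cond += n * w * (tau * math.log2(tau) + (1.0 - tau) * math.log2(1.0 - tau))
    return -sum(p * math.log2(p / binom[m]) for m, p in enumerate(p_y)) + cond


def uniform_fq(tau):
    return 1.0


def arcsine_fq(tau):
    return 1.0 / (math.pi * math.sqrt(tau * (1.0 - tau)))


for n in (1, 4, 16):
    print(n, mi_from_fq(n, uniform_fq), mi_from_fq(n, arcsine_fq))
```

The arcsine choice for $f_Q$ is included only for comparison; its role is discussed in Sec. II.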

3. Entropy of the random variable, Q 

If we make a change of variable from $\tau$ to $x$, and note that $f_X(x)\,dx = f_Q(\tau)\,d\tau$, the entropy of $Q$ can be written as

$$H(Q) = -\int_0^1 f_Q(\tau) \log_2\big(f_Q(\tau)\big)\, d\tau = -D\big(f_X(x) \,\|\, f_\eta(\theta - x)\big), \qquad (18)$$
which is the negative of the relative entropy between the signal PDF and the noise PDF reversed about $x = 0$ and shifted by $\theta$. In the event that the noise PDF is an even function about its mean, and $\theta$ is equal to the signal mean, then the entropy of $Q$ is simply the negative of the relative entropy between the signal and noise PDFs.
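Eq. (18) can be sanity-checked numerically. The sketch below assumes zero-mean Gaussian signal and noise with $\theta = 0$, $\sigma_x = 1$ and $\sigma_\eta = 2$ (illustrative values for which $f_Q$ is bounded), integrates $-f_Q \log_2 f_Q$ over $\tau$ using Eq. (14), and compares the result with minus the closed-form relative entropy between two zero-mean Gaussians. Python's standard-library `statistics.NormalDist` supplies the PDF and inverse CDF; the function names are ours.

```python
import math
from statistics import NormalDist


def entropy_q(signal, noise, theta=0.0, steps=20000):
    """H(Q) in bits, by midpoint-rule quadrature of Eq. (14) over tau in (0, 1)."""
    h = 0.0
    for k in range(steps):
        tau = (k + 0.5) / steps
        x = theta - noise.inv_cdf(1.0 - tau)        # solve tau = 1 - F_eta(theta - x)
        f_q = signal.pdf(x) / noise.pdf(theta - x)  # Eq. (14)
        if f_q > 0.0:
            h -= f_q * math.log2(f_q) / steps
    return h


def kl_zero_mean_gaussians_bits(sigma_x, sigma_eta):
    """D(N(0, sigma_x^2) || N(0, sigma_eta^2)) in bits.

    With theta = 0 and zero-mean Gaussian noise, f_eta(theta - x) = f_eta(x),
    so this is the relative entropy appearing in Eq. (18).
    """
    nats = math.log(sigma_eta / sigma_x) + sigma_x ** 2 / (2.0 * sigma_eta ** 2) - 0.5
    return nats / math.log(2.0)


signal, noise = NormalDist(0.0, 1.0), NormalDist(0.0, 2.0)
print(entropy_q(signal, noise), -kl_zero_mean_gaussians_bits(1.0, 2.0))
```

Both printed values should be close to $-0.459$ bits; note that $H(Q)$, as a differential entropy on $[0, 1]$, is never positive.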



4. Examples of the PDF $f_Q(\tau)$

The PDF $f_Q(\tau)$ can be derived for specific signal and noise distributions. Table I lists $f_Q(\tau)$ for several cases where the signal and noise share the same distribution and a mean of zero, but with not necessarily equal variances. The threshold value, $\theta$, is also set to zero.

For each case considered, the standard deviation of the noise can be written as $a\sigma_\eta$, where $a$ is a positive constant, and the standard deviation of the signal can be written $a\sigma_x$. We find that $f_Q(\tau)$ in each case is a function of a single parameter that we call the noise intensity, $\sigma = \sigma_\eta/\sigma_x$. Given this, from Eq. (16), it is clear that the mutual information must be a function only of the ratio, $\sigma$, so that it is invariant to a change in $\sigma_x$ provided $\sigma_\eta$ changes by the same proportion. This fact is noted to be true for the Gaussian case in [16] and the uniform case in [21], but here we have illustrated why.

We note, however, that if $\theta$ is not equal to the signal mean, then $f_Q(\tau)$ will depend on the ratio $\theta/\sigma_x$ as well as on $\sigma$, and therefore so will the mutual information.
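The dependence on the single ratio $\sigma = \sigma_\eta/\sigma_x$ can be seen numerically for the Gaussian case. This sketch assumes zero-mean Gaussian signal and noise with $\theta = 0$ and evaluates Eq. (14) for two parameter pairs sharing the same ratio; `f_q_gaussian` is a helper name introduced here.

```python
from statistics import NormalDist


def f_q_gaussian(tau, sigma_x, sigma_eta, theta=0.0):
    """Eq. (14) for zero-mean Gaussian signal and noise."""
    signal, noise = NormalDist(0.0, sigma_x), NormalDist(0.0, sigma_eta)
    x = theta - noise.inv_cdf(1.0 - tau)
    return signal.pdf(x) / noise.pdf(theta - x)


for tau in (0.05, 0.25, 0.5, 0.8):
    a = f_q_gaussian(tau, sigma_x=1.0, sigma_eta=0.5)
    b = f_q_gaussian(tau, sigma_x=3.0, sigma_eta=1.5)  # same ratio, sigma = 0.5
    print(tau, a, b)
```

The two columns agree to rounding error, so any quantity computed from $f_Q$ alone, including Eq. (16), depends only on $\sigma$.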

Table I also lists the entropy of $Q$ for three cases where an analytical expression could be
found. 



D. Large N SSR: Literature Review and Outline of This Paper 

In the absence of noise, the maximum mutual information is the maximum entropy of the output signal, $\log_2(N+1)$. It has been shown for very specific signal and noise distributions that the mutual information in the SSR model scales with $0.5\log_2(N)$ for large $N$ [11, 21]. This means that the channel capacity for large $N$ under the specified conditions is about half the maximum noiseless channel capacity. This situation is discussed in Sec. III.

The only other work to consider SSR in the large $N$ regime finds that the optimal noise intensity for Gaussian signal and noise occurs for $\sigma \approx 0.6$ [16]. Unlike [21], which uses the exact expression of Eq. (12) and derives a large $N$ expression by approximating the summation with an integral, [16] begins by using a Fisher information based approximation to the mutual information.



In Appendix A we re-derive the formula of [16] in a different manner, which results in new large $N$ approximations for the output entropy, as well as the mutual information. These approximations provide the basis for the central result of this paper, which is a general sufficient condition for achieving channel capacity in the SSR model, for any arbitrary specified signal or noise distribution. This is discussed in Sec. II. These new general results are compared with the specific results of [11, 16, 21] in Sec. III.



II. A GENERAL EXPRESSION FOR THE SSR CHANNEL CAPACITY FOR 
LARGE N 



Fisher information [30, 31] has previously been discussed in numerous papers on both neural coding [32] and stochastic resonance [33], and both [34, 35]. However, most SR studies using Fisher information consider only the case where the signal itself is not a random variable. When it is a random variable, it is possible to connect Fisher information and Shannon mutual information under special conditions, as discussed in [16, 22, 36].

It is demonstrated in [16] that the Fisher information at the output of the SSR model, as a function of input signal value $X = x$, is given by

$$J(x) = \frac{N}{P_{1|x}(1 - P_{1|x})} \left(\frac{dP_{1|x}}{dx}\right)^2. \qquad (19)$$

In [16], Eq. (19) is used to approximate the large $N$ mutual information in the SSR model via the formula

$$I(X,y) = H(X) - 0.5 \int_x f_X(x) \log_2\left(\frac{2\pi e}{J(x)}\right) dx. \qquad (20)$$

This expression, which is derived under much more general circumstances in [22, 37], relies on an assumption that an efficient Gaussian estimator for $x$ can be found from the output of the channel, in the limit of large $N$.

In Appendix A we outline an alternative derivation to Eq. (20), from which Eq. (19) can be inferred, that is specific to the SSR model and provides additional justification for its large $N$ asymptotic validity. This alternative derivation allows us to find individual expressions for both the output entropy and conditional output entropy. This derivation
makes use of the auxiliary PDF, $f_Q(\tau)$, derived in Sec. I C. The significance of this approach is that it leads to our demonstration of the new results that the output entropy can be written for large $N$ as

$$H(y) \simeq \log_2(N) - D\big(f_X(x) \,\|\, f_\eta(\theta - x)\big), \qquad (21)$$

while the mutual information can be written as

$$I(X,y) \simeq 0.5 \log_2\left(\frac{N\pi}{2e}\right) - D(f_X \| f_S), \qquad (22)$$

where $f_S(\cdot)$ is a PDF known as Jeffrey's prior,

$$f_S(x) = \frac{\sqrt{J(x)}}{\pi\sqrt{N}}. \qquad (23)$$



It is proven in Appendix A 2 that for the SSR model Eq. (23) is indeed a PDF. This is a remarkable result, as in general Jeffrey's prior has no such simple form. Substitution of Eq. (23) into Eq. (22) and simplifying leads to Eq. (20), which verifies this result.
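For completeness, the algebra connecting Eqs. (20), (22) and (23) can be written out. The following sketch uses only those equations and the definition of differential entropy, Eq. (4). Since Eq. (23) gives $J(x) = N\pi^2 f_S(x)^2$, substituting into Eq. (20) yields

$$
\begin{aligned}
I(X,y) &\simeq H(X) - \frac{1}{2}\int_x f_X(x) \log_2\left(\frac{2\pi e}{N\pi^2 f_S(x)^2}\right) dx \\
&= H(X) + \frac{1}{2}\log_2\left(\frac{N\pi}{2e}\right) + \int_x f_X(x) \log_2 f_S(x)\, dx \\
&= \frac{1}{2}\log_2\left(\frac{N\pi}{2e}\right) - \int_x f_X(x) \log_2\left(\frac{f_X(x)}{f_S(x)}\right) dx \\
&= \frac{1}{2}\log_2\left(\frac{N\pi}{2e}\right) - D(f_X \| f_S),
\end{aligned}
$$

which is Eq. (22). The second line uses $\int_x f_X(x)\,dx = 1$, and the third uses $H(X) = -\int_x f_X(x) \log_2 f_X(x)\,dx$.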

By inspection of Eq. (19), $f_S(x)$ can be derived from knowledge of the noise PDF, $f_\eta(\eta)$, since

$$f_S(x) = \frac{f_\eta(\theta - x)}{\pi\sqrt{F_\eta(\theta - x)\big(1 - F_\eta(\theta - x)\big)}}. \qquad (24)$$

A. A Sufficient Condition for Optimality 

Since relative entropy is always non-negative, from Eq. (22) a sufficient condition for achieving the large $N$ channel capacity is that

$$f_X(x) = f_S(x) \quad \forall x, \qquad (25)$$

with the resultant capacity as

$$C(X,y) = 0.5 \log_2\left(\frac{N\pi}{2e}\right) \simeq 0.5 \log_2 N - 0.3956. \qquad (26)$$

Eq. (26) holds provided the conditions for the approximation given by Eq. (20) hold. Otherwise, the right-hand sides of Eqs. (21) and (22) give lower bounds. This means that for the situations considered previously in [16, 21], where the signal and noise both have the same distribution (but different variances), we can expect to find channel capacity that is less than or equal to that of Eq. (26). This is discussed in Sec. III.
The derived sufficient condition of Eq. (25) leads to two ways in which capacity can be achieved: (i) an optimal signal PDF for a given noise PDF, and (ii) an optimal noise PDF for a given signal PDF.



B. Optimizing the Signal Distribution 

Assuming Eq. (20) holds, the channel capacity achieving input PDF, $f_X^o(x)$, can be found for any given noise PDF from Eqs. (24) and (25) as

$$f_X^o(x) = \frac{f_\eta(\theta - x)}{\pi\sqrt{F_\eta(\theta - x)\big(1 - F_\eta(\theta - x)\big)}}. \qquad (27)$$

1. Example: Uniform Noise 

Suppose the i.i.d. noise at the input to each threshold device in the SSR model is uniformly distributed on the interval $[-\sigma_\eta/2, \sigma_\eta/2]$, so that it has PDF

$$f_\eta(\xi) = \frac{1}{\sigma_\eta}, \quad \xi \in [-\sigma_\eta/2, \sigma_\eta/2]. \qquad (28)$$

Substituting Eq. (28) and its associated CDF into Eq. (27), we find that the optimal signal PDF is

$$f_X^o(x) = \frac{1}{\pi\sqrt{\sigma_\eta^2/4 - (x - \theta)^2}}, \quad x \in [\theta - \sigma_\eta/2, \theta + \sigma_\eta/2]. \qquad (29)$$



This PDF is in fact the PDF of a sine wave with uniformly random phase, amplitude $\sigma_\eta/2$, and mean $\theta$. A change of variable to the interval $r \in [0, 1]$ via the substitution $r = (x - \theta)/\sigma_\eta + 0.5$ results in the PDF of the Beta distribution with parameters 0.5 and 0.5, also known as the arcsine distribution. As mentioned in Sec. I, this result provides some theoretical justification for the analog-to-digital conversion method proposed in [23].

This Beta distribution is bimodal, with the most probable values of the signal being those near zero and unity. Similar results for an optimal input distribution in an information theoretic optimization of a neural system have been found in [38]. These results were achieved numerically using the Blahut-Arimoto algorithm, often used in information theory to find channel capacity achieving source distributions or rate-distortion functions [30].
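The Blahut-Arimoto algorithm mentioned above can be sketched for the SSR channel. This is an illustration only, not the procedure used in [16] or [38]: it assumes uniform noise on $[-1/2, 1/2]$ and $\theta = 0$, and restricts the input to a grid of $M$ points on $[-1/2, 1/2]$, for which Eq. (9) gives $P_{1|x} = x + 1/2$; the output PMF for each grid point then follows from Eq. (10). The values $N = 7$ and $M = 51$, and the function names, are arbitrary choices made here.

```python
import math


def ssr_channel(n, m_points):
    """W[i][k] = P(y = k | x_i) from Eq. (10) on an input grid.

    Assumes uniform noise on [-1/2, 1/2] and theta = 0, so that Eq. (9)
    gives P_{1|x} = x + 1/2 for x on a uniform grid over [-1/2, 1/2].
    """
    grid = [i / (m_points - 1) for i in range(m_points)]  # P_{1|x} values in [0, 1]
    return [[math.comb(n, k) * p ** k * (1.0 - p) ** (n - k) for k in range(n + 1)]
            for p in grid]


def mutual_information_bits(p, w):
    """I(X, y) in bits for input PMF p and channel matrix w."""
    q = [sum(p[i] * w[i][k] for i in range(len(p))) for k in range(len(w[0]))]
    return sum(p[i] * wik * math.log2(wik / q[k])
               for i, row in enumerate(w) for k, wik in enumerate(row) if wik > 0.0)


def blahut_arimoto(w, iterations=1000):
    """Return (mutual information in bits, input PMF) after the given iterations."""
    m, k_out = len(w), len(w[0])
    p = [1.0 / m] * m
    for _ in range(iterations):
        q = [sum(p[i] * w[i][k] for i in range(m)) for k in range(k_out)]
        # d[i] = D(W(.|x_i) || q) in nats
        d = [sum(wik * math.log(wik / q[k]) for k, wik in enumerate(row) if wik > 0.0)
             for row in w]
        scaled = [p[i] * math.exp(d[i]) for i in range(m)]
        total = sum(scaled)
        p = [s / total for s in scaled]
    return mutual_information_bits(p, w), p


w = ssr_channel(n=7, m_points=51)
capacity, p_opt = blahut_arimoto(w)
print(f"capacity estimate: {capacity:.4f} bits (log2(N+1) = {math.log2(8):.4f})")
print(f"mass at the two edges: {p_opt[0]:.3f}, {p_opt[-1]:.3f}")
```

Starting from a uniform input, the iterations shift probability mass towards the two edges of the input range, which is qualitatively consistent with the bimodal shape of Eq. (29); with a finite grid and small $N$, the result is a discrete approximation rather than the continuous arcsine PDF.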
2. Gaussian Noise

Suppose the i.i.d. noise at the input to each threshold device has a zero mean Gaussian distribution with variance $\sigma_\eta^2$, with PDF

$$f_\eta(\xi) = \frac{1}{\sqrt{2\pi\sigma_\eta^2}} \exp\left(-\frac{\xi^2}{2\sigma_\eta^2}\right). \qquad (30)$$

Substituting Eq. (30) and its associated CDF into Eq. (27) gives the optimal signal PDF. The resultant expression for $f_X^o(x)$ does not simplify much, and contains the standard error function, $\mathrm{erf}(\cdot)$ [39].

We are able to verify that the resultant PDF has the correct shape via Fig. 8 in [16], which presents the result of numerically optimizing the signal PDF, $f_X(x)$, for unit variance zero mean Gaussian noise, $\theta = 0$, and $N = 10000$. As with the work in [38], the numerical optimization is achieved using the Blahut-Arimoto algorithm. It is remarked in [16] that the optimal $f_X(x)$ is close to being Gaussian. This is illustrated by plotting both $f_X(x)$ and a Gaussian PDF with nearly the same peak value as $f_X(x)$. It is straightforward to show that a Gaussian with the same peak value as our analytical $f_X^o(x)$ has variance $0.25\pi^2$: at $x = \theta = 0$, Eq. (27) gives $f_X^o(0) = 2 f_\eta(0)/\pi = 2/(\pi\sqrt{2\pi})$, which equals the peak $1/(\sigma_x\sqrt{2\pi})$ of a Gaussian when $\sigma_x = \pi/2$. If the signal was indeed Gaussian, then we would have $\sigma = \sigma_\eta/\sigma_x = 2/\pi \approx 0.6366$, which is very close to the value calculated for actual Gaussian signal and noise in Sec. III.

Our analytical $f_X^o(x)$ from Eqs. (30) and (27), with $\theta = 0$, is plotted on the interval $x \in [-3, 3]$ in Fig. 2, along with a Gaussian PDF with variance $0.25\pi^2$. Clearly the optimal signal PDF is very close to the Gaussian PDF. Our Fig. 2 is virtually identical to Fig. 8 in [16]. It is emphasized that the results in [16] were obtained using an entirely different method that involves numerical iterations, and therefore provide excellent validation of our theoretical results.



C. Optimizing the Noise Distribution 

We now assume that the signal distribution is known and fixed. We wish to achieve channel capacity by finding the optimal noise distribution. It is easy to show by integrating Eq. (24) that the CDF corresponding to the PDF, $f_S(\cdot)$, evaluated at $x$, can be written in terms of the CDF of the noise distribution as

$$F_S(x) = 1 - \frac{2}{\pi} \arcsin\left(\sqrt{F_\eta(\theta - x)}\right). \qquad (31)$$



If we now let $f_X(x) = f_S(x)$, then $F_X(x) = F_S(x)$, and rearranging Eq. (31) gives the optimal noise CDF in terms of the signal CDF as

$$F_\eta^o(x) = \sin^2\left(\frac{\pi}{2}\big(1 - F_X(\theta - x)\big)\right) = 0.5 + 0.5 \cos\big(\pi F_X(\theta - x)\big). \qquad (32)$$

Differentiating $F_\eta^o(x)$ gives the optimal noise PDF as a function of the signal PDF and CDF,

$$f_\eta^o(x) = \frac{\pi}{2} \sin\Big(\pi\big(1 - F_X(\theta - x)\big)\Big) f_X(\theta - x). \qquad (33)$$

Unlike optimizing the signal distribution, which is the standard way of achieving channel capacity in information theory [30], here we have fixed the signal distribution and found the 'best' noise distribution, which is equivalent to optimizing the channel rather than the signal.



1. Example: Uniform Signal 

Suppose the signal is uniformly distributed on the interval $x \in [-\sigma_x/2, \sigma_x/2]$. From Eqs. (32) and (33), the capacity achieving noise distribution has CDF

$$F_\eta^o(x) = 0.5 + 0.5 \sin\left(\frac{\pi(x - \theta)}{\sigma_x}\right), \quad x \in [\theta - \sigma_x/2, \theta + \sigma_x/2] \qquad (34)$$

and PDF

$$f_\eta^o(x) = \frac{\pi}{2\sigma_x} \cos\left(\frac{\pi(x - \theta)}{\sigma_x}\right), \quad x \in [\theta - \sigma_x/2, \theta + \sigma_x/2]. \qquad (35)$$

Substitution of $F_\eta^o(x)$ and $f_\eta^o(x)$ into Eq. (19) gives the interesting result that the Fisher information is constant for all $x$,

$$J(x) = N\frac{\pi^2}{\sigma_x^2}. \qquad (36)$$

This is verified in Eq. (37) below.
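Eqs. (32), (33) and (19) can be combined numerically to confirm Eq. (36). This sketch assumes a uniform signal on $[-\sigma_x/2, \sigma_x/2]$ with $\sigma_x = 2$, $\theta = 0$ and $N = 100$ (arbitrary illustrative values); the helper names are hypothetical.

```python
import math

SIGMA_X = 2.0
THETA = 0.0


def signal_cdf(u):
    """CDF of the uniform signal on [-SIGMA_X/2, SIGMA_X/2]."""
    return min(1.0, max(0.0, u / SIGMA_X + 0.5))


def signal_pdf(u):
    return 1.0 / SIGMA_X if abs(u) <= SIGMA_X / 2.0 else 0.0


def optimal_noise_cdf(v):
    """Eq. (32)."""
    return 0.5 + 0.5 * math.cos(math.pi * signal_cdf(THETA - v))


def optimal_noise_pdf(v):
    """Eq. (33)."""
    return (0.5 * math.pi * math.sin(math.pi * (1.0 - signal_cdf(THETA - v)))
            * signal_pdf(THETA - v))


def fisher_information(x, n):
    """Eq. (19) with P_{1|x} from Eq. (9) and dP_{1|x}/dx = f_eta(theta - x)."""
    p1 = 1.0 - optimal_noise_cdf(THETA - x)
    return n * optimal_noise_pdf(THETA - x) ** 2 / (p1 * (1.0 - p1))


for x in (-0.9, -0.5, 0.0, 0.3, 0.9):
    print(x, fisher_information(x, 100), 100 * math.pi ** 2 / SIGMA_X ** 2)  # Eq. (36)
```

Points exactly at the edges of the signal support are avoided, since both numerator and denominator of Eq. (19) vanish there.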



D. Consequences of Optimizing the Large N Channel Capacity 

1. Optimal Fisher Information 

Regardless of whether we optimize the signal for given noise, or optimize the noise for a given signal, it is straightforward to show, by combining Eqs. (23) and (25), that the Fisher information can be written as a function of the signal PDF,

$$J(x) = N\pi^2 \big(f_X(x)\big)^2. \qquad (37)$$



Therefore, the Fisher information at large $N$ channel capacity is constant over the support of the signal if and only if the signal is uniformly distributed. The optimality of constant Fisher information in a neural coding context is studied in [32].



2. The Optimal PDF $f_Q(\tau)$

A further consequence that holds in both cases is that the ratio of the signal PDF to the noise PDF is

$$\frac{f_X(x)}{f_\eta(\theta - x)} = \frac{2}{\pi \sin\Big(\pi\big(1 - F_X(x)\big)\Big)}. \qquad (38)$$

This is not a PDF. However, if we make a change of variable via $\tau = 1 - F_\eta(\theta - x)$, we get the PDF $f_Q(\tau)$ discussed in Sec. I C, which for channel capacity is

$$f_Q^o(\tau) = \frac{1}{\pi\sqrt{\tau(1 - \tau)}}, \quad \tau \in [0, 1]. \qquad (39)$$

This optimal $f_Q(\tau)$ is in fact the PDF of the Beta distribution with parameters 0.5 and 0.5, i.e. the arcsine distribution. It is emphasised that this result holds regardless of whether the signal PDF is optimised for a given noise PDF or vice versa.



3. Output Entropy at Channel Capacity 

From Eq. (18), the entropy of $Q$ is equal to the negative of the relative entropy between $f_X(x)$ and $f_\eta(\theta - x)$. The entropy of $Q$ when capacity is achieved can be calculated from Eq. (39) using direct integration as

$$H(Q) = \log_2(\pi) - 2. \qquad (40)$$

From Eqs. (21) and (18), the large $N$ output entropy at channel capacity in the SSR model is

$$H(y) \simeq \log_2\left(\frac{N\pi}{4}\right). \qquad (41)$$

4. The Optimal Output PMF is Beta-Binomial

Suppose we have signal and noise such that $f_Q(\tau) = f_Q^o(\tau)$, i.e. the signal and noise satisfy the sufficient condition, Eq. (25), but that $N$ is not necessarily large. We can derive the output PMF for this situation by substituting Eq. (39) into Eq. (17) to get

$$P_y(n) = \binom{N}{n} \frac{1}{\pi} \int_0^1 \tau^{(n-0.5)} (1 - \tau)^{(N-n-0.5)}\, d\tau = \binom{N}{n} \frac{\beta(n + 0.5, N - n + 0.5)}{\beta(0.5, 0.5)}, \qquad (42)$$

where $\beta(a, b)$ is a Beta function. This PMF can be recognized as that of the Beta-binomial (or negative hypergeometric) distribution with parameters $N$, 0.5, 0.5 [40]. It is emphasized that Eq. (42) holds as an exact analytical result for any $N$.



5. Analytical Expression for the Mutual Information

The exact expression for the output PMF of Eq. (42) allows exact calculation of both the output entropy and the mutual information, without need for numerical integration, using Eq. (16). This is because when $f_Q(\tau) = f_Q^o(\tau)$, the integrals in Eq. (16) can be evaluated exactly to get

$$I_o(X,y) = -\sum_{n=0}^{N} P_y(n) \log_2\left(\frac{P_y(n)}{\binom{N}{n}}\right) + N \log_2\left(\frac{e}{4}\right). \qquad (43)$$

The exact values of $I_o(X,y)$ and the corresponding output entropy, $H_o(y)$, are plotted in Fig. 3(a) for $N = 1, \ldots, 1000$. For comparison, the exact $I(X,y)$ of Eq. (12), which holds for $f_X(x) = f_\eta(\theta - x)$, is also plotted, as well as the corresponding entropy, $H(y) = \log_2(N+1)$. It is clear that $I_o(X,y)$ is always larger than the mutual information of the $f_X(x) = f_\eta(\theta - x)$ case, and that $H_o(y)$ is always less than the entropy of that case, which is the maximum output entropy.

To illustrate that the large $N$ expressions derived are lower bounds to the exact formula plotted in Fig. 3(a), and that the error between them decreases with $N$, Fig. 3(b) shows the difference between the exact and the large $N$ mutual information and output entropy. This difference clearly decreases with increasing $N$.
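Eqs. (42) and (43) need nothing beyond log-gamma functions. The sketch below (function names ours) evaluates the beta-binomial PMF in the log domain, which avoids overflowing binomial coefficients at large $N$, and prints $I_o(X,y)$ and $H_o(y)$ next to the large $N$ expressions of Eqs. (26) and (41).

```python
import math


def log_comb(n, k):
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def log_beta(a, b):
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def optimal_output_pmf(n):
    """Eq. (42): beta-binomial PMF with parameters N, 0.5, 0.5."""
    log_b0 = log_beta(0.5, 0.5)  # equals ln(pi)
    return [math.exp(log_comb(n, k) + log_beta(k + 0.5, n - k + 0.5) - log_b0)
            for k in range(n + 1)]


def optimal_output_entropy(n):
    """H_o(y) in bits."""
    return -sum(p * math.log2(p) for p in optimal_output_pmf(n))


def optimal_mutual_information(n):
    """Eq. (43): I_o(X, y) in bits."""
    pmf = optimal_output_pmf(n)
    first = -sum(p * (math.log(p) - log_comb(n, k)) for k, p in enumerate(pmf)) / math.log(2.0)
    return first + n * math.log2(math.e / 4.0)


for n in (1, 10, 100, 1000):
    print(n,
          optimal_mutual_information(n), 0.5 * math.log2(n * math.pi / (2 * math.e)),  # Eq. (26)
          optimal_output_entropy(n), math.log2(n * math.pi / 4.0))                     # Eq. (41)
```

For $N = 1$ the PMF is $(0.5, 0.5)$ and Eq. (43) reduces to $\log_2 e - 1 \approx 0.443$ bits.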

E. A Note on the Output Entropy 



The SSR model has been described in terms of signal quantization theory in [28], and compared with the related process of companding in [41]. In this context, quantization means the conversion of a continuously valued signal to a discretely valued signal that has only a finite number of possible values. Quantization in this sense occurs in analog-to-digital converter circuits, lossy compression algorithms, and histogram formation [42]. For a deterministic scalar quantizer with $N + 1$ output states, $N$ threshold values are required. In quantization theory, there is a concept of high resolution quantizers, in which the distribution of $N \to \infty$ threshold values can be described by a point density function, $\lambda(x)$. For such quantizers, it can be shown that the quantizer output, $y$, in response to a random variable, $X$, has entropy $H(y) \simeq \log_2 N - D(f_X\|\lambda)$ [42]. This is strikingly similar to our Eq. (21) for the large N output entropy of the SSR model. In fact, since the noise that perturbs the fixed threshold value, $\theta$, is additive, each threshold acts as an iid random variable with PDF $f_\eta(\theta - x)$. Therefore, for large N, $f_\eta(\theta - x)$ acts as a density function describing the relative frequency of threshold values as a function of $x$, just as $\lambda(x)$ does for a high resolution deterministic quantizer.
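The resemblance to the quantizer result can be explored numerically. The Python sketch below is an illustration added here, not a reproduction of the authors' computations. It assumes a zero-mean, unit-variance Gaussian signal, zero-mean Gaussian noise with standard deviation $\sigma$, and $\theta = 0$, so that $P_{1|x} = \Phi(x/\sigma)$. It computes the exact output PMF by integrating the binomial $P_{y|x}(n|x)$ against $f_X(x)$ with Simpson's rule, and compares the exact $H(y)$ with the large N form $\log_2 N - D(f_X\|f_\eta)$ of Eq. (A9). The function names, integration range and grid size are our own choices.

```python
import math


def exact_output_entropy(N, sigma, x_max=9.0, n_grid=3600):
    """Exact H(y) in bits for the SSR model (theta = 0, unit-variance Gaussian
    signal, Gaussian noise with standard deviation sigma)."""
    h = 2 * x_max / n_grid                      # n_grid must be even for Simpson's rule
    log_binom = [math.lgamma(N + 1) - math.lgamma(n + 1) - math.lgamma(N - n + 1)
                 for n in range(N + 1)]
    p_y = [0.0] * (N + 1)
    for i in range(n_grid + 1):
        x = -x_max + i * h
        weight = (1 if i in (0, n_grid) else (4 if i % 2 else 2)) * h / 3
        f_x = math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)
        # P_{1|x} = 1 - F_eta(-x) and its complement, both via erfc to avoid cancellation
        log_p1 = math.log(0.5 * math.erfc(-x / (sigma * math.sqrt(2))))
        log_p0 = math.log(0.5 * math.erfc(x / (sigma * math.sqrt(2))))
        for n in range(N + 1):
            p_y[n] += weight * f_x * math.exp(log_binom[n] + n * log_p1 + (N - n) * log_p0)
    return -sum(p * math.log2(p) for p in p_y if p > 0)


def large_n_output_entropy(N, sigma):
    """log2(N) - D(f_X || f_eta) for zero-mean Gaussians, with D in bits."""
    d = math.log2(sigma) + (sigma ** -2 - 1) / (2 * math.log(2))
    return math.log2(N) - d


for sigma in (1.0, 1.6):
    print(sigma, exact_output_entropy(63, sigma), large_n_output_entropy(63, sigma))
```

At $\sigma = 1$ the exact value should reproduce $H(y) = \log_2(N+1)$, which is a convenient check on the integration.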

For deterministic quantizers, the point density function can be used to approximate the high resolution distortion incurred by the quantization process. For the SSR model, however, since the quantization has a random aspect, the distortion has a component due to randomness as well as to lossy compression, and cannot be calculated simply from $f_\eta(\cdot)$. Instead, one can use the Fisher information to calculate the asymptotic mean square error distortion, which is not possible for deterministic high resolution quantizers.



III. CHANNEL CAPACITY FOR LARGE N AND 'MATCHED' SIGNAL AND NOISE

Unlike the previous section, we now consider channel capacity under the constraint of 'matched' signal and noise distributions, i.e. where both the signal and noise, while still independent, have the same distribution, other than their variances. The mean of both signal and noise is zero and the threshold value is also $\theta = 0$. In this situation the mutual information depends solely on the ratio $\sigma = \sigma_\eta/\sigma_x$, which is the only free variable. Finding channel capacity is therefore equivalent to finding the optimal value of noise intensity, $\sigma$. Such an analysis provides verification of the more general capacity expression of Eq. (26), which cannot be exceeded.

Furthermore, inspection of Eq. (A10) shows that the large N approximation to the mutual information consists of a term that depends on N and a term that depends only on $\sigma$. This shows that for large N the channel capacity occurs for the same value of $\sigma$, which we denote as $\sigma_o$, for all N.



This fact is recognized both in [21] for uniform signal and noise, where $\sigma_o \to 1$, and in [16] for Gaussian signal and noise. Here, we investigate the value of $\sigma_o$ and the mutual information at $\sigma_o$ for other signal and noise distributions, and compare the channel capacity obtained with the case where $f_X(x) = f_S(x)$. This comparison finds that the results of [16] overstate the true capacity, and that the large N results in [11, 21] need to be improved to be consistent with the central results of this paper.

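From Eq. (22), channel capacity for large N occurs for the value of $\sigma$ that minimizes the relative entropy between $f_X$ and $f_S$. If we let

$$ f(\sigma) = \int_{x=-\infty}^{\infty} f_X(x)\ln\left(\frac{1}{J(x)}\right)dx, \qquad (44) $$

then from Eq. (20) it is also clear that this minimization is equivalent to solving the following problem,

$$ \sigma_o = \arg\min_{\sigma} f(\sigma). \qquad (45) $$

This is exactly the formulation stated in [16]. Problem (45) can be equivalently expressed as

$$ \sigma_o = \arg\min_{\sigma} f(\sigma), \qquad f(\sigma) = D(f_X\|f_\eta) + \int_{x=-\infty}^{\infty} f_X(x)\log_2\left(P_{1|x}\right)dx, \qquad (46) $$

where we have assumed that both the signal and noise PDFs are even functions. Here $f(\sigma)$ differs from that of Eq. (44) only by a positive scale factor and an additive constant independent of $\sigma$, neither of which changes the minimizer. The function $f(\sigma)$ can be found for any specified signal and noise distribution by numerical integration, and Problem (46) easily solved numerically. If an exact expression for the relative entropy term is known, then only $g(\sigma) = -\int_{x=-\infty}^{\infty} f_X(x)\log_2(P_{1|x})\,dx$ needs to be numerically calculated, since $f(\sigma) = D(f_X\|f_\eta) - g(\sigma)$.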

Table II gives the result of numerically calculating the value of $\sigma_o$, and the corresponding large N channel capacity, $C(X,y)$, for a number of distributions. In each case, $C(X,y) - 0.5\log_2(N) < -0.3956$, as required by Eq. (26). The difference between capacity and $0.5\log_2(N)$ is about 0.4 bits per sample. In the limit of large N, this shows that capacity is almost identical, regardless of the distribution. However, the value of $\sigma_o$ at which this capacity occurs is different in each case.
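As an illustration of how Problem (46) can be solved numerically, the Python sketch below treats the Gaussian case. It assumes unit signal variance, so that $P_{1|x} = \Phi(x/\sigma)$ with $\theta = 0$, uses the closed-form Gaussian relative entropy $D(f_X\|f_\eta) = \log_2\sigma + (\sigma^{-2}-1)/(2\ln 2)$, evaluates $g(\sigma)$ by Simpson's rule, and minimizes $f(\sigma) = D(f_X\|f_\eta) - g(\sigma)$ by golden-section search. For even PDFs, Eq. (A11) then gives the large N capacity offset as $C(X,y) - 0.5\log_2 N \simeq -0.5\log_2(2\pi e) - f(\sigma_o)$. The integration range, grid size, search bracket and function names are illustrative assumptions.

```python
import math

LN2 = math.log(2)


def simpson(func, a, b, n=4000):
    """Composite Simpson's rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3


def golden_section_min(func, lo, hi, tol=1e-5):
    """Minimizer of a unimodal function on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2
    c, d = hi - inv_phi * (hi - lo), lo + inv_phi * (hi - lo)
    fc, fd = func(c), func(d)
    while hi - lo > tol:
        if fc < fd:                 # minimum lies in [lo, d]
            hi, d, fd = d, c, fc
            c = hi - inv_phi * (hi - lo)
            fc = func(c)
        else:                       # minimum lies in [c, hi]
            lo, c, fc = c, d, fd
            d = lo + inv_phi * (hi - lo)
            fd = func(d)
    return 0.5 * (lo + hi)


def rel_entropy(sigma):
    """D(f_X || f_eta) in bits for zero-mean Gaussians, sigma = sigma_eta / sigma_x."""
    return math.log2(sigma) + (sigma ** -2 - 1) / (2 * LN2)


def g(sigma):
    """g(sigma) = -int f_X(x) log2(P_{1|x}) dx with P_{1|x} = Phi(x / sigma)."""
    def integrand(x):
        p1 = 0.5 * math.erfc(-x / (sigma * math.sqrt(2)))
        return -math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi) * math.log2(p1)
    return simpson(integrand, -10.0, 10.0)


def f(sigma):
    return rel_entropy(sigma) - g(sigma)


sigma_o = golden_section_min(f, 0.3, 1.5)
offset = -0.5 * math.log2(2 * math.pi * math.e) - f(sigma_o)
print(f"sigma_o ~ {sigma_o:.4f}, C - 0.5 log2(N) ~ {offset:.4f}")
```

The printed values can be compared with the Gaussian row of Table II.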

As discussed in Sec. II B, the mutual information is identical whenever the signal and noise PDFs are identical, i.e. $\sigma = 1$. It is shown below in Eq. (48) that for large N the mutual information at $\sigma = 1$ is $I(X,y) \simeq 0.5\log_2(N) - 0.6044$. Given that the channel capacity is larger than this, as indicated by Table II, for each case there is a constant difference between the channel capacity and the mutual information at $\sigma = 1$. This value is also listed in Table II.

A. Improvements to Previous Large N Approximations 

We now use the results of Sec. II to show that previous large N expressions for the mutual information in the literature for the $\sigma = 1$, Gaussian and uniform cases can be improved.

1. SSR for Large N and σ = 1



We now consider the situation where $f_X(x) = f_\eta(x)$, so that $\sigma = 1$. It is shown in [11] that in this case, as N approaches infinity, Eq. (12) reduces to

/(X, y) ~ 0.5 log 2 y—p^j - °- 5 lo S2 (N + 1) - 0.7213. (47) 

To show that this expression can be improved, we begin with the version of Eq. (20) given by Eq. (A10). When $\sigma = 1$ we have $f_Q(\tau) = 1$ and $H(Q) = 0$. The integrals in Eq. (A10) can then be solved to give the large N mutual information at $\sigma = 1$ as

$$ I(X,y) \simeq 0.5\log_2\left(\frac{Ne}{2\pi}\right) \simeq 0.5\log_2 N - 0.6044. \qquad (48) $$

Although Eqs. (47) and (48) agree to leading order as $N \to \infty$, the constant terms do not agree. It is shown in Appendix A 3 that the discrepancy can be resolved by improving on the approximation to the average conditional entropy, $H(y|X)$, made in [11]. The output entropy at $\sigma = 1$ can be shown to be simply $H(y) = \log_2(N+1)$ [11]. Subtracting Eq. (A18) from $H(y)$ and letting N approach infinity gives

$$ I(X,y) \simeq 0.5\log_2\left(\frac{(N+2)e}{2\pi}\right), \qquad (49) $$

which does have a constant term that agrees with Eq. (48). The explanation of the discrepancy is that [11] uses the Euler-Maclaurin summation formula to implicitly calculate $\log_2(N!)$ in the large N approximation to $H(y|X)$. Using Stirling's approximation for $N!$, as done here, gives a more accurate approximation.






The increased accuracy of Eq. (48) can be verified by numerically comparing both Eq. (48) and Eq. (47) with the exact expression for $I(X,y)$ of Eq. (12) as N increases. The error between the exact expression and Eq. (48) approaches zero as N increases, whereas the error between Eq. (12) and Eq. (47) approaches a nonzero constant for large N of $0.5\log_2(e^2/(2\pi)) \simeq 0.117$ bits per sample.
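The comparison described above can be sketched in a few lines of Python. The sketch relies on $H(y) = \log_2(N+1)$ at $\sigma = 1$, as stated in the text, and on the closed form $H(y|X) = N/(2\ln 2) - \frac{1}{N+1}\sum_{n=0}^{N}\log_2\binom{N}{n}$ for $\sigma = 1$ (the exact expression from [11] used in Appendix A 3). The log-binomial coefficients come from `math.lgamma`; the function names are illustrative.

```python
import math


def log2_binom_sum(N):
    """sum_{n=0}^{N} log2 C(N, n), computed with the log-gamma function."""
    return sum(math.lgamma(N + 1) - math.lgamma(n + 1) - math.lgamma(N - n + 1)
               for n in range(N + 1)) / math.log(2)


def exact_mi_sigma1(N):
    """Exact I(X,y) in bits at sigma = 1: log2(N + 1) - H(y|X)."""
    h_cond = N / (2 * math.log(2)) - log2_binom_sum(N) / (N + 1)
    return math.log2(N + 1) - h_cond


def eq48(N):
    return 0.5 * math.log2(N * math.e / (2 * math.pi))


def eq49(N):
    return 0.5 * math.log2((N + 2) * math.e / (2 * math.pi))


for N in (15, 127, 1023):
    exact = exact_mi_sigma1(N)
    print(N, round(exact, 4), round(exact - eq48(N), 4), round(exact - eq49(N), 4))
```

For N = 1 this reduces to the familiar single-device value $1 - 1/(2\ln 2)$, which is a useful sanity check.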



2. Uniform Signal and Noise 



A derivation is given in [21] of an exact expression for $I(X,y)$ for uniform signal and noise and $\sigma < 1$. In addition, [21] finds a large N approximation to the mutual information. Using the same arguments as for the $\sigma = 1$ case, this approximation can be improved to

$$ I(X,y) \simeq \frac{\sigma}{2}\log_2\left(\frac{(N+2)e}{2\pi}\right) + (1-\sigma)\left(1 - \log_2(1-\sigma)\right) - \sigma\log_2(\sigma). \qquad (50) $$



The accuracy of Eq. (50) can be verified by numerical comparison with the exact formula in [21] as N increases. If one replicates Fig. 3 of [21] in this manner, it is clear that Eq. (50) is the more accurate approximation.

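Differentiating Eq. (50) with respect to $\sigma$ and setting the result to zero gives the optimal value of $\sigma$ as

$$ \sigma_o = \frac{\sqrt{(N+2)e/(2\pi)}}{2 + \sqrt{(N+2)e/(2\pi)}}. \qquad (51) $$

The channel capacity at $\sigma_o$ is

$$ C(X,y) = 1 - \log_2(1 - \sigma_o) = \log_2\left(2 + \sqrt{\frac{(N+2)e}{2\pi}}\right). \qquad (52) $$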

Clearly, $\lim_{N\to\infty}\sigma_o = 1$, and the capacity approaches $0.5\log_2((N+2)e/(2\pi))$, which agrees with Eq. (49). Expressions for $\sigma_o$ and the corresponding capacity for large N are also given in [21]. Again, these are slightly different from Eqs. (51) and (52), due to the slightly inaccurate terms in the large N approximation to $H(y|X)$. However, the important qualitative result remains the same: the channel capacity scales with $0.5\log_2(N)$, and the value of $\sigma$ that achieves it asymptotically approaches unity.
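Eqs. (50) to (52) are easy to check against one another. The Python sketch below (function names are illustrative) evaluates Eq. (50), locates its maximum over a fine grid of $\sigma$ values, and compares the result with the closed forms of Eqs. (51) and (52); the choice N = 127 is arbitrary.

```python
import math


def mi_uniform(sigma, N):
    """Eq. (50): large-N mutual information for uniform signal and noise, 0 < sigma < 1."""
    a = (N + 2) * math.e / (2 * math.pi)
    return (0.5 * sigma * math.log2(a)
            + (1 - sigma) * (1 - math.log2(1 - sigma))
            - sigma * math.log2(sigma))


def sigma_opt(N):
    """Eq. (51)."""
    r = math.sqrt((N + 2) * math.e / (2 * math.pi))
    return r / (2 + r)


def capacity(N):
    """Eq. (52)."""
    return math.log2(2 + math.sqrt((N + 2) * math.e / (2 * math.pi)))


N = 127
grid = [i / 10000 for i in range(1, 10000)]
sigma_grid = max(grid, key=lambda s: mi_uniform(s, N))   # brute-force check of Eq. (51)
print(sigma_grid, sigma_opt(N), mi_uniform(sigma_opt(N), N), capacity(N))
```

Because Eq. (50) is strictly concave in $\sigma$, the grid maximum can only sit next to the closed-form optimum.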
3. Gaussian Signal and Noise



In [16], an analytical approximation for $\sigma_o$ for the specific case of Gaussian signal and noise is derived using a Taylor expansion of the Fisher information inside the integral in Eq. (20). We give a slightly different derivation of this approach that uses the PDF $f_Q(\tau)$.

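We begin with Problem (46). Solving this problem requires differentiating $f(\sigma)$ with respect to $\sigma$ and solving for zero. From Table I, the derivative of the relative entropy between $f_X$ and $f_\eta$ is

$$ \frac{d}{d\sigma}D(f_X\|f_\eta) = \frac{1}{\ln 2}\left(\frac{1}{\sigma} - \frac{1}{\sigma^3}\right). \qquad (53) $$

For the second term, $g(\sigma)$, we take the lead from [16] and approximate $\ln(P_{1|x})$ by its second order Taylor series expansion about $x = 0$ [39]. The result is that

$$ g(\sigma) = -\int_{x=-\infty}^{\infty} f_X(x)\log_2(P_{1|x})\,dx \simeq 1 + \frac{1}{\pi\sigma^2\ln 2}. \qquad (54) $$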



Numerical testing finds that the approximation of Eq. (54) appears to be quite accurate, with a relative error of no more than about 10 percent for $\sigma > 0.2$. However, as we will see, it is inaccurate enough to cause the approximate channel capacity to significantly overstate the true channel capacity.

Taking the derivative of Eq. (54) with respect to $\sigma$, subtracting it from Eq. (53), setting the result to zero and solving for $\sigma$ gives the optimal value of $\sigma$ found in [16], $\sigma_o \simeq \sqrt{1 - 2/\pi} \simeq 0.6028$.

An expression for the mutual information at $\sigma_o$ can be found by back-substitution. Carrying this out gives the large N channel capacity for Gaussian signal and noise as

$$ C(X,y) \simeq 0.5\log_2\left(\frac{2N}{e(\pi - 2)}\right), \qquad (55) $$

which can be written as $C(X,y) \simeq 0.5\log_2 N - 0.3169$.

Although Eq. (55) is close to correct, recall from Sec. II that capacity must be less than $0.5\log_2 N - 0.3956$, and hence Eq. (55) significantly overstates the true capacity.
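The back-substitution leading to Eq. (55) can be confirmed numerically. This Python sketch uses the Gaussian relative entropy $D(f_X\|f_\eta) = \log_2\sigma + (\sigma^{-2}-1)/(2\ln 2)$ and the Taylor approximation of Eq. (54). It evaluates the N-independent part of the capacity, $-0.5\log_2(2\pi e) - [D(f_X\|f_\eta) - g(\sigma)]$, at $\sigma_o = \sqrt{1 - 2/\pi}$, and compares it with the constant in Eq. (55) and with the bound implied by Eq. (26). Variable names are illustrative.

```python
import math

LN2 = math.log(2)


def rel_entropy(sigma):
    """D(f_X || f_eta) in bits for zero-mean Gaussians."""
    return math.log2(sigma) + (sigma ** -2 - 1) / (2 * LN2)


def g_taylor(sigma):
    """Second-order Taylor approximation of g(sigma), Eq. (54)."""
    return 1 + 1 / (math.pi * sigma ** 2 * LN2)


sigma_o = math.sqrt(1 - 2 / math.pi)                               # Taylor-approximate optimum
offset_backsub = (-0.5 * math.log2(2 * math.pi * math.e)
                  - (rel_entropy(sigma_o) - g_taylor(sigma_o)))
offset_eq55 = 0.5 * math.log2(2 / (math.e * (math.pi - 2)))        # Eq. (55) minus 0.5 log2 N
offset_bound = 0.5 * math.log2(math.pi / (2 * math.e))             # Eq. (26) bound minus 0.5 log2 N
print(round(sigma_o, 4), round(offset_eq55, 4), round(offset_bound, 4))
# prints: 0.6028 -0.3169 -0.3956
```

The Taylor-based constant exceeds the bound, which is the overstatement noted above.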
APPENDIX A: DERIVATIONS

1. Mutual Information for Large N and Arbitrary σ

This appendix contains derivations of the large N approximations to the output entropy and mutual information discussed in Sec. II.

a. Conditional Output Entropy 

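An approximation to the conditional output entropy, $H(y|X)$, can be derived by noting that for large N the binomial distribution can be approximated by a Gaussian distribution with the same mean and variance, i.e. $NP_{1|x}$ and $NP_{1|x}(1 - P_{1|x})$ respectively. Provided $0 < NP_{1|x} < N$, we have

$$ P_{y|x}(n|x) \simeq \frac{1}{\sqrt{2\pi N P_{1|x}(1-P_{1|x})}}\exp\left(-\frac{(n - NP_{1|x})^2}{2NP_{1|x}(1-P_{1|x})}\right). \qquad (A1) $$

The average conditional output entropy is $H(y|X) = \int f_X(x)H(y|x)\,dx$, where

$$ H(y|x) = -\sum_{n=0}^{N} P_{y|x}(n|x)\log_2\left(P_{y|x}(n|x)\right). \qquad (A2) $$

Using the well known result for the entropy of a Gaussian random variable, we can write

$$ H(y|x) \simeq 0.5\log_2\left(2\pi e N P_{1|x}(1 - P_{1|x})\right). \qquad (A3) $$

Multiplying both sides of Eq. (A3) by $f_X(x)$ and integrating over all $x$ gives

$$ H(y|X) \simeq 0.5\log_2(2\pi e N) + 0.5\int_{x=-\infty}^{\infty} f_X(x)\log_2\left(P_{1|x}(1-P_{1|x})\right)dx = 0.5\log_2(2\pi e N) + 0.5\int_{\tau=0}^{1} f_Q(\tau)\log_2\left(\tau(1-\tau)\right)d\tau. \qquad (A4) $$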
Eq. (A4) can be verified for the case where $f_X(x) = f_\eta(\theta - x)$, since this means $f_Q(\tau) = 1$ and $\int_{\tau=0}^{1} f_Q(\tau)\log_2(\tau)\,d\tau = -\log_2(e)$. Consequently Eq. (A4) reduces to $H(y|X) \simeq 0.5\log_2(2\pi N/e)$, which agrees precisely with Eq. (A19). This approximation breaks down when $P_{1|x}$ is close to zero or unity. Furthermore, Eq. (A4) holds exactly only for values of $x$ for which $P_{y|x}(n|x)$ is exactly Gaussian. Otherwise, $H(y|X)$ is strictly less than the approximation given.
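The quality of the Gaussian approximation in Eq. (A3) can be checked directly against the exact binomial entropy of Eq. (A2). The Python sketch below (function names are illustrative) sums the binomial PMF in log space for a fixed $P_{1|x} = p$. For moderate $Np(1-p)$ the two values agree closely, while for $p$ near zero the Gaussian form noticeably overestimates the exact entropy, consistent with the remark above.

```python
import math


def binomial_entropy_bits(N, p):
    """Exact entropy of Binomial(N, p) in bits, i.e. Eq. (A2) with P_{1|x} = p."""
    h = 0.0
    for n in range(N + 1):
        log_pmf = (math.lgamma(N + 1) - math.lgamma(n + 1) - math.lgamma(N - n + 1)
                   + n * math.log(p) + (N - n) * math.log1p(-p))
        h -= math.exp(log_pmf) * log_pmf      # underflowed terms contribute zero
    return h / math.log(2)


def gaussian_entropy_bits(N, p):
    """Eq. (A3): 0.5 log2(2 pi e N p (1 - p))."""
    return 0.5 * math.log2(2 * math.pi * math.e * N * p * (1 - p))


for p in (0.5, 0.3, 0.01, 0.001):
    print(p, binomial_entropy_bits(1000, p), gaussian_entropy_bits(1000, p))
```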



b. Output Distribution and Entropy

For large N, since $P_{y|x}(n|x)$ is approximately Gaussian with a standard deviation that grows only as $\sqrt{N}$, the distribution of $y/N$ approaches a delta function located at $n/N = P_{1|x}$. From Eqs. (7) and (17), this means that $P_y(n)$ can be written in terms of the PDF of the average transfer function, $f_Q(\cdot)$, as

$$ P_y(n) \simeq \frac{f_Q(n/N)}{N}. \qquad (A5) $$

This result can be derived more rigorously using saddlepoint methods [22].

Consider the case where the signal and noise both have the same distribution but different variances. When the noise intensity $\sigma > 1$, then $f_Q(0) = f_Q(1) = 0$, whereas for $\sigma < 1$, we have $f_Q(0) = f_Q(1) = \infty$. From Eq. (A5), this means $P_y(0)$ and $P_y(N)$ are either zero or infinite. However, for finite N, there is some finite nonzero probability that all output states are on or off. Indeed, at $\sigma = 1$, we know that $P_y(n) = 1/(N+1)\ \forall n$, and at $\sigma = 0$, $P_y(0) = P_y(N) = 0.5$. Furthermore, for finite N, Eq. (A5) does not guarantee that $\sum_{n=0}^{N} P_y(n) = 1$. To increase the accuracy of our approximation by ensuring $P_y(0)$ and $P_y(N)$ are always finite, and that $P_y(n)$ forms a valid PMF, we define a new approximation as

$$ P'_y(n) = \begin{cases} \dfrac{f_Q(n/N)}{N} & \text{for } n = 1, \ldots, N-1, \\[2ex] 0.5\left(1 - \displaystyle\sum_{m=1}^{N-1}\frac{f_Q(m/N)}{N}\right) & \text{for } n = 0,\ n = N. \end{cases} \qquad (A6) $$



Fig. 4 shows that the approximation given by $P'_y(n)$ is highly accurate for N as small as 63, for $\sigma$ both smaller and larger than unity.
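Eq. (A6) is straightforward to implement once $f_Q(\tau)$ is known. The Python sketch below assumes matched Gaussian signal and noise and uses the Gaussian $f_Q(\tau)$ of Table I, rewritten with $\mathrm{erf}^{-1}(2\tau - 1) = \Phi^{-1}(\tau)/\sqrt{2}$ so that only the standard library's `statistics.NormalDist.inv_cdf` is needed. The values $N = 63$ and $\sigma = 1.6$ mirror Fig. 4(b); function names are illustrative.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def f_q_gaussian(tau, sigma):
    """Gaussian f_Q(tau) from Table I, using erfinv(2 tau - 1) = Phi^{-1}(tau) / sqrt(2)."""
    z = _STD_NORMAL.inv_cdf(tau)
    return sigma * math.exp(0.5 * (1 - sigma ** 2) * z * z)


def approx_output_pmf(N, sigma):
    """Eq. (A6): f_Q(n/N)/N for interior states, leftover mass split between n = 0 and n = N."""
    interior = [f_q_gaussian(n / N, sigma) / N for n in range(1, N)]
    end = 0.5 * (1 - sum(interior))
    return [end] + interior + [end]


pmf = approx_output_pmf(63, 1.6)
print(sum(pmf), pmf[0], pmf[31])
```

By construction the result sums to one, and at $\sigma = 1$ every interior state receives $1/N$.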
Consider the entropy of the discrete random variable $y$. Making use of Eq. (A5), we have

$$ H(y) = -\sum_{n=0}^{N} P_y(n)\log_2\left(P_y(n)\right) \simeq -\sum_{n=0}^{N}\frac{f_Q(n/N)}{N}\log_2\left(\frac{f_Q(n/N)}{N}\right) = \log_2(N)\sum_{n=0}^{N}\frac{f_Q(n/N)}{N} - \sum_{n=0}^{N}\frac{f_Q(n/N)}{N}\log_2\left(f_Q(n/N)\right). \qquad (A7) $$

Suppose that the summations above can be approximated by integrals, without any remainder terms. Carrying this out and then making the change of variable $\tau = n/N$ gives

$$ H(y) \simeq \log_2 N - \int_{\tau=0}^{1} f_Q(\tau)\log_2\left(f_Q(\tau)\right)d\tau = \log_2 N + H(Q), \qquad (A8) $$

where $H(Q)$ is the differential entropy of the random variable $Q$. Performing a change of variable in Eq. (A8) of $\tau = 1 - F_\eta(\theta - x)$ gives

$$ H(y) \simeq \log_2(N) - D\left(f_X(x)\,\|\,f_\eta(\theta - x)\right). \qquad (A9) $$

This result shows that $H(y)$ for large N is approximately the sum of the number of output bits and the negative of the relative entropy between $f_X$ and $f_\eta$. Therefore, since relative entropy is always non-negative, the approximation to $H(y)$ given by Eq. (A9) is always less than or equal to $\log_2(N)$. This agrees with the known expression for $H(y)$ in the specific case of $\sigma = 1$, namely $\log_2(N+1)$, which holds for any N.
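The entropy $H(Q)$ in Eq. (A8) can be computed directly from $f_Q(\tau)$. The Python sketch below integrates $-f_Q\log_2 f_Q$ for the Gaussian $f_Q(\tau)$ of Table I with the midpoint rule, and compares the result with the closed form $-\log_2\sigma - (\sigma^{-2}-1)/(2\ln 2)$, which is $-D(f_X\|f_\eta)$ for Gaussian PDFs, as Eqs. (A8) and (A9) imply. It is restricted to $\sigma \ge 1$, where $f_Q$ stays bounded; the grid size and names are illustrative.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def f_q_gaussian(tau, sigma):
    """Gaussian f_Q(tau) from Table I, using erfinv(2 tau - 1) = Phi^{-1}(tau) / sqrt(2)."""
    z = _STD_NORMAL.inv_cdf(tau)
    return sigma * math.exp(0.5 * (1 - sigma ** 2) * z * z)


def entropy_q_numeric(sigma, n=20000):
    """H(Q) = -int_0^1 f_Q log2 f_Q dtau by the midpoint rule (intended for sigma >= 1)."""
    total = 0.0
    for i in range(n):
        fq = f_q_gaussian((i + 0.5) / n, sigma)
        if fq > 0:
            total -= fq * math.log2(fq)
    return total / n


def entropy_q_closed_form(sigma):
    """Gaussian H(Q) = -D(f_X || f_eta) in bits."""
    return -math.log2(sigma) - (sigma ** -2 - 1) / (2 * math.log(2))


for sigma in (1.0, 1.2, 1.6):
    print(sigma, entropy_q_numeric(sigma), entropy_q_closed_form(sigma))
```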

c. Mutual Information 

Subtracting Eq. (A4) from Eq. (A8) gives a large N approximation to the mutual information as

$$ I(X,y) \simeq 0.5\log_2\left(\frac{N}{2\pi e}\right) + H(Q) - 0.5\int_{\tau=0}^{1} f_Q(\tau)\log_2\left(\tau(1-\tau)\right)d\tau. \qquad (A10) $$

As discussed in the main text, the mutual information scales with $0.5\log_2(N)$. The importance of the N-independent terms in Eq. (A10) is that they determine how the mutual information varies from $0.5\log_2(N)$ for different PDFs, $f_Q(\tau)$.
Fig. 5 shows, as examples, the approximation of Eq. (A10), as well as the exact mutual information (calculated by numerical integration) for the Gaussian and Laplacian cases, for a range of $\sigma$ and increasing N. As with the output entropy, the mutual information approximation is quite good for $\sigma > 0.7$, but worsens for smaller $\sigma$. However, as N increases the approximation improves.
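Eq. (A10) can be evaluated from $f_Q(\tau)$ alone. As a sketch, the Python code below does this for matched Laplacian signal and noise. It uses the Laplacian $f_Q(\tau)$ of Table I and its entropy $H(Q) = -\log_2\sigma + (1 - \sigma^{-1})/\ln 2$, which is our own evaluation of $-D(f_X\|f_\eta)$ for Laplacian PDFs. The substitution $u = (2\tau)^\sigma$ on $[0, 0.5]$, together with the symmetry of $f_Q$ about $\tau = 0.5$, turns $\int_0^1 f_Q(\tau)\log_2(\tau(1-\tau))\,d\tau$ into $\int_0^1 \log_2(\tau(1-\tau))\,du$ with $\tau = u^{1/\sigma}/2$, which is then evaluated by the midpoint rule. Maximizing the N-independent part over $\sigma$ by golden-section search gives a value that can be compared with the Laplacian row of Table II. Grid size, search bracket and names are illustrative assumptions.

```python
import math

LN2 = math.log(2)


def entropy_q_laplacian(sigma):
    """H(Q) = -D(f_X || f_eta) in bits for matched zero-mean Laplacian PDFs."""
    return -math.log2(sigma) + (1 - 1 / sigma) / LN2


def mean_log_tau(sigma, n=20000):
    """int_0^1 f_Q(tau) log2(tau (1 - tau)) dtau, via u = (2 tau)^sigma and the midpoint rule."""
    total = 0.0
    for i in range(n):
        tau = 0.5 * ((i + 0.5) / n) ** (1 / sigma)
        total += math.log2(tau * (1 - tau))
    return total / n


def offset(sigma):
    """N-independent part of Eq. (A10), i.e. I(X,y) - 0.5 log2(N)."""
    return (-0.5 * math.log2(2 * math.pi * math.e)
            + entropy_q_laplacian(sigma) - 0.5 * mean_log_tau(sigma))


def golden_section_max(func, lo, hi, tol=1e-5):
    """Maximizer of a unimodal function on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2
    c, d = hi - inv_phi * (hi - lo), lo + inv_phi * (hi - lo)
    fc, fd = func(c), func(d)
    while hi - lo > tol:
        if fc > fd:                 # maximum lies in [lo, d]
            hi, d, fd = d, c, fc
            c = hi - inv_phi * (hi - lo)
            fc = func(c)
        else:                       # maximum lies in [c, hi]
            lo, c, fc = c, d, fd
            d = lo + inv_phi * (hi - lo)
            fd = func(d)
    return 0.5 * (lo + hi)


sigma_o = golden_section_max(offset, 0.3, 1.2)
print(round(sigma_o, 3), round(offset(sigma_o), 4), round(offset(1.0), 4))
```

At $\sigma = 1$ the offset should reduce to the constant of Eq. (48).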

Eq. (A10) can be rewritten via the change of variable $x = \theta - F_\eta^{-1}(1 - \tau)$ as

$$ I(X,y) \simeq 0.5\log_2\left(\frac{N}{2\pi e}\right) - 0.5\int_{x=-\infty}^{\infty} f_X(x)\log_2\left(P_{1|x}(1 - P_{1|x})\right)dx - D\left(f_X(x)\,\|\,f_\eta(\theta - x)\right). \qquad (A11) $$

Rearranging Eq. (A11) gives Eq. (20), with the Fisher information, $J(x)$, given by Eq. (19). This is precisely the same as the expression derived in [3] as an asymptotic large N expression for the mutual information. Our analysis extends [3] by finding large N approximations to both $H(y)$ and $H(y|X)$, as well as the output distribution, $P_y(n)$. We have also illustrated the role of the PDF, $f_Q(\tau)$, in these approximations, and justified the use of Eq. (20) for the SSR model.



2. Proof that $f_S(x)$ is a PDF

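The Fisher information for the SSR model is given by Eq. (19). Consider $f_S(x)$ as in Eq. (24). Since $f_\eta(x)$ is a PDF and $F_\eta(x)$ is the CDF of $\eta$ evaluated at $x$, we have $f_S(x) \ge 0\ \forall x$. Letting $h(x) = F_\eta(\theta - x)$, so that $h'(x) = -f_\eta(\theta - x)$, Eq. (24) can be written as

$$ f_S(x) = \frac{-h'(x)}{\pi\sqrt{h(x) - h(x)^2}}. \qquad (A12) $$

Suppose $f_\eta(x)$ has support $x \in [-a, a]$, and take $\theta = 0$, so that $h(x)$ falls from $h(-a) = 1$ to $h(a) = 0$ (for $\theta \neq 0$ the same argument holds on $[\theta - a, \theta + a]$). Since $\frac{d}{dx}\arcsin\left(\sqrt{h(x)}\right) = \frac{h'(x)}{2\sqrt{h(x) - h(x)^2}}$, integrating $f_S(x)$ over all $x$ gives

$$ \int_{x=-a}^{a} f_S(x)\,dx = \int_{x=-a}^{a}\frac{-h'(x)}{\pi\sqrt{h(x) - h(x)^2}}\,dx = -\frac{2}{\pi}\left[\arcsin\left(\sqrt{h(x)}\right)\right]_{x=-a}^{x=a} = -\frac{2}{\pi}\left(\arcsin(0) - \arcsin(1)\right) = 1, \qquad (A13) $$

which means $f_S(x)$ is a PDF.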
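As a numerical companion to Eq. (A13), the Python sketch below assumes zero-mean, unit-variance Gaussian noise and $\theta = 0$. It builds $f_S(x)$ from the Fisher information written in the standard binomial form $J(x) = N f_\eta(\theta - x)^2/[F_\eta(\theta - x)(1 - F_\eta(\theta - x))]$ (our reading of Eq. (19)), via $f_S(x) = \sqrt{J(x)/N}/\pi$, and integrates it by Simpson's rule. Gaussian noise has unbounded support, so this checks the conclusion of Eq. (A13) rather than its finite-support argument. The peak value $f_S(0) = 2/(\pi\sqrt{2\pi})$ is also the peak of a zero-mean Gaussian of variance $0.25\pi^2$, consistent with Fig. 2. Names, range and grid size are illustrative.

```python
import math


def fisher_information(x, N, theta=0.0):
    """Binomial-form Fisher information for unit-variance Gaussian noise."""
    u = theta - x
    pdf = math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)   # f_eta(theta - x)
    cdf = 0.5 * math.erfc(-u / math.sqrt(2))                # F_eta(theta - x)
    ccdf = 0.5 * math.erfc(u / math.sqrt(2))                # 1 - F_eta(theta - x), no cancellation
    return N * pdf * pdf / (cdf * ccdf)


def f_s(x, theta=0.0):
    """f_S(x) = sqrt(J(x)/N) / pi, which does not depend on N."""
    return math.sqrt(fisher_information(x, 1, theta)) / math.pi


def simpson(func, a, b, n=6000):
    """Composite Simpson's rule with an even number n of subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3


area = simpson(f_s, -15.0, 15.0)
peak = f_s(0.0)
print(area, peak, 2 / (math.pi * math.sqrt(2 * math.pi)))
```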
3. H(y|X) for Large N and σ = 1



Here we derive a large N approximation to $H(y|X)$ used in Sec. III A 1. For $\sigma = 1$ the output PMF is $P_y(n) = 1/(N+1)\ \forall n$. Using this, it can be shown that

$$ \frac{1}{N+1}\sum_{n=0}^{N}\log_2\binom{N}{n} = \frac{2}{N+1}\sum_{n=1}^{N} n\log_2 n - \log_2(N!). \qquad (A14) $$

We will now see that both terms of Eq. (A14) can be simplified by approximations that hold for large N. Firstly, for the $\log_2(N!)$ term, we can make use of Stirling's formula [39], which is valid for large N,

$$ N! \simeq \sqrt{2\pi N}\,N^N\exp(-N). \qquad (A15) $$

This approximation is particularly accurate if the log is taken of both sides, which we require. Secondly, the sum over $n\log_2 n$ in Eq. (A14) can be approximated by an integral and simplified by way of the Euler-Maclaurin summation formula [39]. The result is

$$ \frac{2}{N+1}\sum_{n=1}^{N} n\log_2 n \simeq N\log_2(N+1) - \frac{N(N+2)}{2\ln 2\,(N+1)} + O\left(\frac{\log N}{N}\right). \qquad (A16) $$



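Subtracting Eq. (A16) from the log of Eq. (A15) gives

$$ -\frac{1}{N+1}\sum_{n=0}^{N}\log_2\binom{N}{n} \simeq 0.5\log_2 N - \frac{N}{2\ln 2}\,\frac{N+2}{N+1} + 0.5\log_2(2\pi) + O\left(\frac{\log N}{N}\right), \qquad (A17) $$

where we have used $N\log_2(1 + 1/N) = 1/\ln 2 + O(1/N)$. When Eq. (A17) is substituted into an exact expression for $H(y|X)$ given in [11],

$$ H(y|X) = \frac{N}{2\ln 2} - \frac{1}{N+1}\sum_{n=0}^{N}\log_2\binom{N}{n}, $$

we get

$$ H(y|X) \simeq 0.5\log_2 N - \frac{N}{2(N+1)}\log_2(e) + 0.5\log_2(2\pi) + O\left(\frac{\log N}{N}\right). \qquad (A18) $$

The final result is that for large N,

$$ H(y|X) \simeq 0.5\log_2\left(\frac{2\pi N}{e}\right). \qquad (A19) $$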
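The accuracy of Eqs. (A17) to (A19) can be examined by computing the binomial sums exactly. This Python sketch (names are illustrative) evaluates $-\frac{1}{N+1}\sum_{n}\log_2\binom{N}{n}$ with `math.lgamma`, compares it with the right-hand side of Eq. (A17) without the $O(\log N/N)$ remainder, and compares the exact $H(y|X)$ at $\sigma = 1$ with Eq. (A19). Both differences should shrink roughly like $\log N/N$.

```python
import math

LN2 = math.log(2)


def lhs_a17(N):
    """-(1/(N+1)) sum_n log2 C(N, n), computed exactly with log-gamma."""
    s = sum(math.lgamma(N + 1) - math.lgamma(n + 1) - math.lgamma(N - n + 1)
            for n in range(N + 1))
    return -s / ((N + 1) * LN2)


def rhs_a17(N):
    """Right-hand side of Eq. (A17) without the O(log N / N) term."""
    return 0.5 * math.log2(N) - N * (N + 2) / (2 * LN2 * (N + 1)) + 0.5 * math.log2(2 * math.pi)


def h_cond_exact(N):
    """Exact H(y|X) at sigma = 1: N/(2 ln 2) - (1/(N+1)) sum_n log2 C(N, n)."""
    return N / (2 * LN2) + lhs_a17(N)


for N in (10, 100, 1000, 10000):
    print(N, lhs_a17(N) - rhs_a17(N),
          h_cond_exact(N) - 0.5 * math.log2(2 * math.pi * N / math.e))
```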
FIG. 1: The SSR model consists of N parallel threshold devices, each with the same threshold value, $\theta$. The common input signal is a continuously valued random signal, X, consisting of a sequence of discrete time uncorrelated samples. Each device receives an independently noisy version of X. The noise signals, $\eta_i$, are iid additive random signals that are independent of X. The output from the i-th device, $y_i$, is unity if $X + \eta_i > \theta$ and zero otherwise. The overall output, y, is the sum of the individual outputs, $y_i$.
FIG. 2: The optimal signal PDF, $f_X(x)$, for zero mean, unity variance Gaussian noise, and threshold value $\theta = 0$, as obtained from Eq. (27). Superimposed is a Gaussian PDF with the same peak value as $f_X(x)$, so that it has variance $0.25\pi^2$. This figure uses our new theoretical results to analytically replicate Fig. 8 in [10], which was calculated numerically.
FIG. 3: (a) Exact expressions obtained using $f_Q(\tau)$, for $I_o(X,y)$ and $H_o(y)$, as well as the exact mutual information and output entropy when $f_X(x) = f_\eta(\theta - x)$ (denoted as $\sigma = 1$), as a function of N. (b) The difference between the exact expressions for $I_o(X,y)$, $H_o(y)$ and $I(X,y)$ for $f_X(x) = f_\eta(\theta - x)$, and the corresponding large N expressions given by Eqs. (22), (41) and (49).
(a) σ = 0.4, Gaussian signal and noise
(b) σ = 1.6, Gaussian signal and noise

FIG. 4: Approximation to the output PMF, $P_y(n)$, given by Eq. (A6), for N = 63. Circles indicate the exact $P_y(n)$ obtained by numerical integration and crosses show the approximations.
FIG. 5: Large N approximation to mutual information given by Eq. (A10) and exact mutual information calculated numerically. The exact expression is shown by thin solid lines, and the approximation by circles, with a thicker solid line interpolating between values of $\sigma$ as an aid to the eye. The approximation can be seen to always be a lower bound on the exact mutual information.
TABLE I: The auxiliary PDF, $f_Q(\tau)$, for five different 'matched' signal and noise distributions (i.e. same distribution but with different variances), as well as $H(Q)$, the entropy of $f_Q(\tau)$. The threshold value, $\theta$, and the signal and noise means are assumed to be zero, so that these results are independent of $\theta$. The noise intensity, $\sigma = \sigma_\eta/\sigma_x$, is the ratio of the noise standard deviation to the signal standard deviation. For the Cauchy case, $\sigma$ is the ratio of the full-width-at-half-maximum parameters. The label 'NAS' indicates that there is no analytical solution for the entropy.

Distribution | $f_Q(\tau)$ | $H(Q)$
Gaussian | $\sigma\exp\left((1-\sigma^2)\left(\mathrm{erf}^{-1}(2\tau-1)\right)^2\right)$ | $-\log_2(\sigma) - \frac{\sigma^{-2}-1}{2\ln 2}$
Uniform, $\sigma > 1$ | $\sigma$ for $0.5 - \frac{1}{2\sigma} \le \tau \le 0.5 + \frac{1}{2\sigma}$; 0 otherwise | $-\log_2(\sigma)$
Laplacian | $\sigma(2\tau)^{(\sigma-1)}$ for $0 \le \tau \le 0.5$; $\sigma(2(1-\tau))^{(\sigma-1)}$ for $0.5 \le \tau \le 1$ | $-\log_2(\sigma) + \frac{1-\sigma^{-1}}{\ln 2}$
Logistic | $\sigma\frac{(\tau(1-\tau))^{(\sigma-1)}}{\left(\tau^\sigma + (1-\tau)^\sigma\right)^2}$ | NAS
Cauchy | $\sigma\frac{1+\tan^2(\pi(\tau-0.5))}{1+\sigma^2\tan^2(\pi(\tau-0.5))}$ | NAS






TABLE II: Large N channel capacity and optimal $\sigma$ for 'matched' signal and noise.

Distribution | $C(X,y) - 0.5\log_2(N)$ | $\sigma_o$ | $C(X,y) - I(X,y)|_{\sigma=1}$
Gaussian | -0.3964 | 0.6563 | 0.208
Logistic | -0.3996 | 0.5943 | 0.205
Laplacian | -0.3990 | 0.5384 | 0.205